Zurich by the Numbers - Predictive Insights into Tourism Dynamic

Authors
Affiliation

Name I, First Name I

University of Lausanne

Name II, First Name II

Published

May 8, 2024

Abstract

The following Forecasting project focuses on applying forecasting techniques to predict tourism trends in Zurich. This analysis aims to harness the power of historical data combined with forecasting algorithms to provide actionable insights into future tourism patterns. We engage in comprehensive data preparation, explore various predictive models, and conduct a detailed evaluation of their forecasting accuracy. The project encapsulates the challenge of turning complex data into understandable and strategic information, crucial for effective decision-making in Zurich’s tourism sector.

1 Exploration & Visualization

1.1 Objectives

The main objectives of this project is to predict :

  • The overnight stays of the visitors in Vaud, from October 2023 until December 2024.

  • The overnight satys of visitors from Philippines to Zürich, from October 2023 until December 2024.

2 DATA

2.1 Cleaning & Wrangling

2.1.1 Tourism Data - General Overview

The dataset contains information about the overnights stays by tourists in the various Swiss cantons. It indicates the tourist’s country of origin, the canton of stay, the month, the year and the total number of overnights stays.

As the dataset provided is in German, we have translated the data in English to make it more intuitive and understandable for everyone. Then, we created a new ‘Date’ column, year-month-day, which corresponds to the correct format to be able to make predictions.

Click to show code
# Load the data in folder data named Dataset_tourism.xlsx)
tourism_data <- readxl::read_xlsx(here("data/Dataset_tourism.xlsx"))

#removing value 'Herkunftsland - Total' in column 'Herkunftsland' as it is just the total
#tourism_data <- tourism_data %>% filter(Herkunftsland != "Herkunftsland - Total")
#print unique values in month column
unique(tourism_data$Monat)
#>  [1] "Januar"    "Februar"   "März"      "April"     "Mai"      
#>  [6] "Juni"      "Juli"      "August"    "September" "Oktober"  
#> [11] "November"  "Dezember"
# change ' [1] "Januar"    "Februar"   "März"      "April"     "Mai"       "Juni"      "Juli"      "August" "September" "Oktober"   "November"  "Dezember" into english month'
tourism_data$Monat <- tourism_data$Monat %>% recode_factor(
  "Januar" = "January",
  "Februar" = "February",
  "März" = "March",
  "April" = "April",
  "Mai" = "May",
  "Juni" = "June",
  "Juli" = "July",
  "August" = "August",
  "September" = "September",
  "Oktober" = "October",
  "November" = "November",
  "Dezember" = "December"
)
#add date type column for plotting purposes
tourism_data <- tourism_data %>% mutate(Date = dmy(paste("01", Monat, Jahr)))
# filtering out 'Herkunftsland - Total' in column 'Herkunftsland' as it is just the total
tourism_data_no_total <- tourism_data %>% filter(Herkunftsland != "Herkunftsland - Total")
#check for NAN
sum(is.na(tourism_data_no_total))
#> [1] 51395
#analyse the NAN values, where are they
(tourism_data_no_total %>% filter(is.na(value)))
#> # A tibble: 51,395 x 6
#>    Herkunftsland                  Kanton Monat Jahr  value Date      
#>    <chr>                          <chr>  <fct> <chr> <dbl> <date>    
#>  1 Malta                          Schwe~ Janu~ 2005     NA 2005-01-01
#>  2 Zypern                         Schwe~ Janu~ 2005     NA 2005-01-01
#>  3 Mexiko                         Schwe~ Janu~ 2005     NA 2005-01-01
#>  4 Übriges Zentralamerika, Karib~ Schwe~ Janu~ 2005     NA 2005-01-01
#>  5 Bahrain                        Schwe~ Janu~ 2005     NA 2005-01-01
#>  6 Katar                          Schwe~ Janu~ 2005     NA 2005-01-01
#>  7 Kuwait                         Schwe~ Janu~ 2005     NA 2005-01-01
#>  8 Australien                     Schwe~ Janu~ 2005     NA 2005-01-01
#>  9 Neuseeland, Ozeanien           Schwe~ Janu~ 2005     NA 2005-01-01
#> 10 Oman                           Schwe~ Janu~ 2005     NA 2005-01-01
#> # i 51,385 more rows
#show data using reactable only showing the first 100 rows
reactable::reactable(head(tourism_data_no_total, 1000), searchable = TRUE)

2.1.2 Tourism Data - Vaud

Click to show code
# Filter by canton Vaud 
tourism_vaud <- tourism_data %>% filter(Kanton == "Vaud")
#check for NAN
sum(is.na(tourism_vaud))
#> [1] 1869
#show the data in a table using reactable
reactable::reactable(head(tourism_vaud, 20))

2.1.3 Tourism Data - Zurich

We filtered the ‘Canton’ column to keep only the canton of Zurich.

Click to show code
#filter column 'Kanton' for Zurich
tourism_data_zurich <- tourism_data_no_total %>% filter(Kanton == "Zürich")
#check for NAN
sum(is.na(tourism_data_zurich))
#> [1] 1869
#analyse the NAN values, where are they
tourism_data_zurich %>% filter(is.na(value))
#> # A tibble: 1,869 x 6
#>    Herkunftsland                  Kanton Monat Jahr  value Date      
#>    <chr>                          <chr>  <fct> <chr> <dbl> <date>    
#>  1 Malta                          Zürich Janu~ 2005     NA 2005-01-01
#>  2 Zypern                         Zürich Janu~ 2005     NA 2005-01-01
#>  3 Mexiko                         Zürich Janu~ 2005     NA 2005-01-01
#>  4 Übriges Zentralamerika, Karib~ Zürich Janu~ 2005     NA 2005-01-01
#>  5 Bahrain                        Zürich Janu~ 2005     NA 2005-01-01
#>  6 Katar                          Zürich Janu~ 2005     NA 2005-01-01
#>  7 Kuwait                         Zürich Janu~ 2005     NA 2005-01-01
#>  8 Australien                     Zürich Janu~ 2005     NA 2005-01-01
#>  9 Neuseeland, Ozeanien           Zürich Janu~ 2005     NA 2005-01-01
#> 10 Oman                           Zürich Janu~ 2005     NA 2005-01-01
#> # i 1,859 more rows

#show the data in a table using reactable
reactable::reactable(head(tourism_data_zurich, 1000))

2.1.4 Tourism Data - Zurich and Philipines

Same as above, but we’re also filtering the country of origin, keeping only ‘Philippinen’ as that is what we’re aiming for.

Click to show code
tourism_data_zurich_philippines <- tourism_data_zurich %>% filter(Herkunftsland == "Philippinen")
#show table using reactable
reactable::reactable(tourism_data_zurich_philippines)

2.1.5 Deal with NAN

A supprimer ou juste mentionner quon a pas de NAN We have none in the data filtered with Zurich and Philippines, but if we would have we would : Impute missing values ARIMA

If the missing values are random or if excluding them would result in a loss of valuable information, we might consider imputing them. One common approach is to use statistical models like ARIMA to interpolate missing values based on the patterns observed in the available data.

Click to show code
# #Creating a tsibble with missing values
# data <- tourism_data_zurich_philippines %>%
#   as_tsibble(key = c(Kanton, Herkunftsland, Monat, Jahr)) %>%
#   select(Date, value) %>%
#   fill_gaps()
# 
# # Fit an ARIMA model to data with missing values
# model_fit <- data %>%
#   model(ARIMA(value))
# 
# # Interpolate missing values using the fitted ARIMA model
# filled_data <- model_fit %>%
#   interpolate(data)
# 
# # Print the data with filled in missing values
# print(filled_data)

3 EDA - Vaud

3.1 Visitors from different countries in Vaud

Click to show code
plot_vaud <- tourism_vaud %>% 
  filter(Herkunftsland != 'Herkunftsland - Total') %>%
  ggplot(aes(x = Date, y = value, group = Herkunftsland, color = Herkunftsland,
                      text = paste("Country:", Herkunftsland, "Trips:", value))) +  # Added text for tooltip
  geom_line(show.legend = FALSE) + 
  labs(title = "Number of visitors from Each Country to Vaud",
       x = "Date",
       y = "Number of Trips") +
  theme_minimal()

# Convert to an interactive plotly object
interactive_plot <- ggplotly(plot_vaud, tooltip = "text")

# Adjust plotly settings 
interactive_plot <- interactive_plot %>%
  layout(margin = list(l = 60, r = 60, b = 60, t = 80),  # Adjust margins
         legend = list(orientation = "h", x = 0, xanchor = "left", y = -0.2))  # Adjust legend position

# Display the interactive plot
interactive_plot

According to the graph the most tourist in canton Vaud are Swiss and France.

Graphical view of total number of tourists in canton Vaud

Click to show code
head(tourism_vaud)
#> # A tibble: 6 x 6
#>   Herkunftsland         Kanton Monat   Jahr  value Date      
#>   <chr>                 <chr>  <fct>   <chr> <dbl> <date>    
#> 1 Herkunftsland - Total Vaud   January 2005  57942 2005-01-01
#> 2 Schweiz               Vaud   January 2005  27853 2005-01-01
#> 3 Baltische Staaten     Vaud   January 2005    103 2005-01-01
#> 4 Deutschland           Vaud   January 2005   3052 2005-01-01
#> 5 Frankreich            Vaud   January 2005   7667 2005-01-01
#> 6 Italien               Vaud   January 2005   2154 2005-01-01
tourism_vaud_total <- tourism_vaud %>%
  filter(Herkunftsland == 'Herkunftsland - Total') %>%
  select(-c(Herkunftsland, Kanton, Monat, Jahr))
tourism_vaud_total%>%
  ggplot(aes(x = Date, y = value)) +
  geom_line() +
  labs(x = "Date", y = "Number of tourists", title = "Total number of tourists in canton Vaud") + 
  theme_minimal()

3.1.1 Decomposition

Click to show code
# Convert data to a time series object
vaud_ts <- tourism_vaud_total %>%
  arrange(Date) %>%
   # Filtre pour enlever les valeurs NA dans 'Date'
  filter(!is.na(Date)) %>%
  # Ensure data is complete and monthly
  complete(Date = seq.Date(min(Date, na.rm = TRUE), max(Date, na.rm = TRUE), by = "month")) %>%
  replace_na(list(value = 0)) %>%  # Replace NA values if there are any
  # Create a time series object
  with(ts(value, frequency = 12, start = decimal_date(min(Date, na.rm = TRUE))))

# Decompose the time series
vaud_ts %>% decompose() %>% plot()

We see clear seasonality with picks in summer and dip in January.

We saw a big drop in March 2020 due to the global coronavirus pandemic.

4 EDA - Zurich

4.1 Zurich and All visiting countries

Click to show code
# Preparing the data
#removing value 'Schweiz' in column 'Herkunftsland' as it is just the whole of Switzerland
data <- tourism_data_zurich %>%
  filter(!is.na(value)) %>%  # Removing rows with NA values in the 'value' column
  mutate(Monat = month(Date, label = TRUE, abbr = TRUE),  # Extract month from Date
         Jahr = year(Date)) %>%  # Extract year from Date
  group_by(Herkunftsland, Date) %>%  # Group by country and date
  summarise(Trips = sum(value), .groups = 'drop')  # Summing up trips for each country per date

p <- ggplot(data, aes(x = Date, y = Trips, group = Herkunftsland,
                      color = Herkunftsland == "Philippinen",
                      text = paste("Country:", Herkunftsland, "<br>Trips:", Trips))) +  # Added text for tooltip
  geom_line(show.legend = FALSE) +
  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "grey")) +
  labs(title = "Number of Trips from Each Country to Zurich",
       x = "Date",
       y = "Number of Trips") +
  theme_minimal()

# Convert to an interactive plotly object
interactive_plot <- ggplotly(p, tooltip = "text")

# Adjust plotly settings 
interactive_plot <- interactive_plot %>%
  layout(margin = list(l = 60, r = 60, b = 60, t = 80),  # Adjust margins
         legend = list(orientation = "h", x = 0, xanchor = "left", y = -0.2),
         width = 600,
         height = 400)  # Adjust legend position
#> Warning: Specifying width/height in layout() is now deprecated.
#> Please specify in ggplotly() or plot_ly()
# Display the interactive plot
interactive_plot

This graph shows changes in the number of overnight stays from January 2005 to September 2023.

The red curve represents the Philippines. It is flat and shows a considerably small number of trips from this country to Zurich over the period. The grey curves represent other countries visiting the canton of Zurich. Swiss people are the most frequent visitors, followed by Germany and the United States.

A drastic drop can be observed. This could be due to the COVID-19 pandemic, which had a significant impact on international travel and thus on tourism industry worldwide.

At first glance, we can observe regular seasonal peaks for most countries, suggesting the presence of seasonality in tourism to the canton of Zurich.

4.2 Zurich and Philippinens Visitors

This graph shows only visitors from the Philippines, as this is the country of interest in our analysis.

Click to show code
# use tourism_data_zurich_philippines data to plot the values in y axis and Date in x axis
p <- ggplot(tourism_data_zurich_philippines, aes(x = Date, y = value)) +
  geom_line() +
  labs(title = "Number of Trips from Philipines to Zurich",
       x = "Date",
       y = "Number of Trips") +
  theme_minimal()
p

  • Slight seasonality

  • General upward trend in the number of nights spent before 2020.

  • Very sharp fall in 2020 - covid

  • General upward trend after the fall

4.2.1 Pattern

4.2.1.1 Decomposition

We apply a standard decomposition of the number of overnight stays in Zurich for the Philippines to reveal seasonal trends, long-term trends and irregular components of the data.

Splitting the times series into several components allows us to understand how the different components contribute to the variations observed in Swiss tourism data.

We chose to split the times series into monthly data.

Click to show code
# Convert data to a time series object
tourism_ts <- tourism_data_zurich_philippines %>%
  arrange(Date) %>%
  # Ensure data is complete and monthly
  complete(Date = seq.Date(min(Date), max(Date), by = "month")) %>%
  replace_na(list(value = 0)) %>%  # Replace NA values if there are any
  # Create a time series object
  with(ts(value, frequency = 12, start = decimal_date(min(Date))))

# Decompose the time series
decomposed <- decompose(tourism_ts)

# Plot the decomposed components
plot(decomposed)

4.2.1.2 Seasonality

Click to show code
# Plot the seasonality in one chart 
ggseasonplot(tourism_ts, year.labels = TRUE, year.labels.left = TRUE)

Click to show code
# several chart per month to see the seasonality
ggsubseriesplot(tourism_ts) + ylab("Number of tourists") + xlab("Month") + ggtitle("Seasonal subseries plot")

#debug
#better to use gg_subseries to see the seasonality
#tourism_ts %>% gg_subseries(value) + ylab("Number of tourists") + xlab("Month") + ggtitle("Seasonal subseries plot")

4.2.1.3 Trend

Juste ecrire text sur le upward trend qu’on a vu dans la decomposition

5 MODELLING

This part is about building on your knowledge of time series techniques to model your data. You can investigate various models but you should justify in your report your choices regarding these. Pay attention to the conditions that are needed to apply a specific model. Treat also carefully seasonality, outliers, colinearity, covariates, special events, etc. Remember the following steps:

  1. Aggregation choice for hierarchical time series
  2. Model building
  3. Model selection

5.1 Total number of visitors in Vaud

5.1.1 Outliers, Correlation, Colinearity, Covariates, Special Events ?

Questions ?

5.1.2 ETS model

Click to show code
ets_vaud <- ets(vaud_ts, model = "AAA")
forecast_ets_vaud <- forecast(ets_vaud, h = 24) %>% plot(main = "Forecast of visitors in Vaud", xlab = "Date", ylab = "Number of visitors")

5.1.3 ARIMA

Click to show code
arima_vaud <- auto.arima(vaud_ts, seasonal = TRUE)

# Generate forecasts for the next 2 years (24 months)
forecast_arima_vaud <- forecast(arima_vaud, h = 24)

# Plot the forecast
plot(forecast_arima_vaud, main = "ARIMA Forecast for Vaud Tourism", xlab = "Date", ylab = "Number of Tourists")

Question: Forecast for each country or general forecast?

5.2 Zurich and Philipines

5.2.1 Forecast without dealing with Covid

5.2.1.1 Naive Forecast

Click to show code
#convert tourism_ts to tsibble
tourism_ts <- tourism_ts %>% as_tsibble()
# Fit a naive model
fit <- tourism_ts %>%
  model(NAIVE = NAIVE(value))
# Forecast the next 2 years periods
forecast <- fit %>%
  forecast(h = 24)
# Plot the forecasts along with the historical data, and make the colors of the forecast a bit more transparent for distinguishably purposes
plot <- forecast %>%
  autoplot(tourism_ts, alpha = 0.5) +
  labs(title = "Forecast of tourists from Philipines to Zurich",
       x = "Date",
       y = "Number of tourists") + guides(colour = guide_legend(title = "Forecast"))
plot

5.2.1.2 Exponential Smoothing

  • Why additive errors ? Because the variance of the errors is constant over time no ?
Click to show code
# Fit an ETS model
# Adjusting the model parameters according to the characteristics of the data
# Here "A" means additive error, "N" means no trend, and "N" means no seasonality
# change these if needed
fit <- tourism_ts %>%
  model(ETS_M_seaso = ETS(value ~ error("A") + trend("A") + season("A"))) #multiplicative seasonality)
# Forecast the next 2 years periods
forecast <- fit %>%
  forecast(h = 24)
# Plot the forecasts along with the historical data, and make the colors of the forecast a bit more transparent for distinguishably purposes
plot <- forecast %>%
  autoplot(tourism_ts, alpha = 0.5) +
  labs(title = "Forecast of tourists from Philipines to Zurich",
       x = "Date",
       y = "Number of tourists") + guides(colour = guide_legend(title = "Forecast"))
plot

Clearly see here that the confidence interval is too big, almost like a naive forecast

Why trend dp and seaso M ? - Trend is present so A - Seasonality is present and growing over time so Multiplicative was chosen

Click to show code
# comparing several model
fit <- tourism_ts %>%
  model(
        ETS_M_seaso = ETS(value ~ error("A") + trend("A") + season("M")), #multiplicative seasonality
        ETS_M_seaso_Ad = ETS(value ~ error("A") + trend("Ad") + season("M")), #dampted trend
  )

# Forecast the next 2 years periods
forecast <- fit %>%
  forecast(h = 24)
# Plot the forecasts along with the historical data, and make the colors of the forecast a bit more transparent for distinguishably purposes
plot <- forecast %>%
  autoplot(tourism_ts, level = 90, color = "blue", alpha = 0.5) +
  labs(title = "Forecast of tourists from Philipines to Zurich",
       x = "Date",
       y = "Number of tourists") + guides(colour = guide_legend(title = "Forecast"))
plot

5.2.1.2.1 Thus Chosen Model :
Click to show code
fit <- tourism_ts %>%
  model(ETS_M_seaso = ETS(value ~ error("A") + trend("Ad") + season("M"))) #multiplicative seasonality)
# Forecast the next 2 years periods
forecast <- fit %>%
  forecast(h = 24)
# Plot the forecasts along with the historical data, and make the colors of the forecast a bit more transparent for distinguishably purposes
plot <- forecast %>%
  autoplot(tourism_ts, alpha = 0.5) +
  labs(title = "Forecast of tourists from Philipines to Zurich",
       x = "Date",
       y = "Number of tourists") + guides(colour = guide_legend(title = "Forecast"))
plot

5.2.1.3 ARIMA

Question : Do we need to differentiate the data ?

Click to show code
# Fit an automatic ARIMA model
fit_arima <- tourism_ts %>%
  model(ARIMA_auto = ARIMA(value))

# Forecast the next 2 years (24 months)
forecast_arima <- fit_arima %>%
  forecast(h = 24)

# Plot the forecasts along with the historical data
plot_arima <- forecast_arima %>%
  autoplot(tourism_ts, alpha = 0.5) +
  labs(title = "ARIMA Forecast of Tourists from the Philippines to Zurich",
       x = "Date",
       y = "Number of Tourists") +
  guides(colour = guide_legend(title = "Forecast"))
plot_arima

Ugly forecast, confidence interval is too big

5.2.1.4 Diagnostic

Click to show code
# Check the diagnostics of the ARIMA model
report <- fit_arima %>% report()
#> Series: value 
#> Model: ARIMA(0,1,4)(0,0,2)[12] 
#> 
#> Coefficients:
#>           ma1     ma2      ma3     ma4   sma1    sma2
#>       -0.3889  0.0978  -0.1636  -0.155  0.418  0.1582
#> s.e.   0.0693  0.0809   0.0638   0.095  0.086  0.0688
#> 
#> sigma^2 estimated as 13178:  log likelihood=-1379
#> AIC=2771   AICc=2772   BIC=2795

autoplot(residuals(fit_arima))

#show report in html

Click to show code
# using auto.arima with stepwise and approximation options turned off for a more thorough search
fit_updated <- auto.arima(tourism_ts, seasonal = TRUE, stepwise = FALSE, approximation = FALSE)
summary(fit_updated)
#> Series: tourism_ts 
#> ARIMA(4,1,0)(1,0,0)[12] 
#> 
#> Coefficients:
#>          ar1     ar2     ar3     ar4   sar1
#>       -0.407  -0.161  -0.236  -0.270  0.462
#> s.e.   0.064   0.071   0.073   0.066  0.074
#> 
#> sigma^2 = 12436:  log likelihood = -1373
#> AIC=2758   AICc=2758   BIC=2778
#> 
#> Training set error measures:
#>                ME RMSE  MAE   MPE MAPE  MASE   ACF1
#> Training set 5.55  110 66.9 -77.7  147 0.589 -0.041

5.2.2 Forecast but dealing with Covid now

5.2.2.1 Special Event- Covid impact

Explanation

5.2.2.1.1 How philipines dealt with it

In 2021, the Philippines faced the challenges of the COVID-19 pandemic with a mix of resilience and adaptation.

CHANGE THIS TEXT ITS BULLSHIT ***** - Lockdowns and Waves: The country experienced two waves of COVID-19, leading to prolonged lockdowns throughout the year. These restrictions aimed to curb the spread of the virus. - Vaccination Campaign: Despite challenges, millions of Filipinos received COVID-19 vaccines. However, experts raised concerns about the campaign’s sluggish pace. - Senate Investigations: Lawmakers probed alleged anomalies in pandemic contracts, leading to marathon Senate hearings. - Easing Restrictions: Towards the end of the year, daily cases dropped, and mandatory face shield policies were lifted. This signaled progress in overcoming the crisis. - Risk-Based Decisions: While the holidays looked promising, the threat of new variants remained. Experts advised practicing “risk-based” decisions for activities despite low case numbers1. - Filipinos have become more mindful of hygiene practices, including social distancing, mask-wearing, and proper handwashing. The pandemic also prompted a shift in consumption patterns, with increased focus on essentials and at-home entertainment. However, air travel remains limited due to ongoing concerns.

As for travel, the Philippines continues to navigate the balance between safety and economic recovery. While some travel restrictions have eased, travelers must stay informed about evolving guidelines and exercise caution when planning trips. The threat of new variants underscores the need for vigilance and informed decision-making1 ******

Question : How to deal with this blackswan event ?

5.2.2.1.2 Covariates

TEXT A PAUFINER Covariate Adjustment Adjust your forecasts to account for the impact of COVID-19 by including covariates that capture the effects of the pandemic. These covariates can be used to adjust the forecasts based on the observed changes in the data due to COVID-19. For example, you can include a covariate that captures the effect of lockdowns or travel restrictions on tourism data. Scenario Analysis Conduct scenario analysis to explore the potential impact of different COVID-19 scenarios on your forecasts. By considering a range of possible outcomes, you can better prepare for the uncertainties associated with the pandemic. Sensitivity Analysis Evaluate the sensitivity of your forecasts to changes in key assumptions or parameters. By conducting sensitivity analysis, you can identify the factors that have the greatest impact on your forecasts and assess the robustness of your models.

5.2.2.1.2.1 Data Integration

The Oxford COVID-19 Government Response Tracker (OxCGRT) provides valuable data on government responses to the COVID-19 pandemic, including a Stringency Index that quantifies the severity of lockdown measures. Let me provide more details about this index:

Stringency Index: The Stringency Index is a composite measure that evaluates the strictness of government policies related to COVID-19. It calculates a score based on nine key response metrics: - School closures - Workplace closures - Cancellation of public events - Restrictions on public gatherings - Closures of public transport - Stay-at-home requirements - Public information campaigns - Restrictions on internal movements - International travel controls Each metric is assigned a value between 0 and 100, with a higher score indicating a stricter response. The overall Stringency Index is the mean score of these nine metrics. If policies vary at the subnational level, the index reflects the strictest sub-region’s response level. Importantly, the Stringency Index records the strictness of government policies but does not measure or imply the appropriateness or effectiveness of a country’s response. A higher score does not necessarily mean that a country’s response is better than others lower on the index.

source - Our World In Data

Click to show code
# Convert data to a time series object
tourism_ts <- tourism_data_zurich_philippines
#rename Date as date
names(tourism_ts)[names(tourism_ts) == "Date"] <- "date"
  
#read .csv with stringency index
stringency_df <- read.csv(here("data/stringency_index.csv"))

# Filter data by location
stringency_philippines <- filter(stringency_df, location == "Philippines")
stringency_switzerland <- filter(stringency_df, location == "Switzerland")

# Convert dates and set them to the first day of the month
stringency_philippines$date <- as.Date(format(dmy(stringency_philippines$date), "%Y-%m-01"))
stringency_switzerland$date <- as.Date(format(dmy(stringency_switzerland$date), "%Y-%m-01"))

# Aggregate to monthly average, ensuring date format is maintained
stringency_philippines <- stringency_philippines %>%
  group_by(date) %>%
  summarize(avg_stringency_index = mean(stringency_index, na.rm = TRUE))

stringency_switzerland <- stringency_switzerland %>%
  group_by(date) %>%
  summarize(avg_stringency_index = mean(stringency_index, na.rm = TRUE))

# Merge with Philippines data first
merged_data_philippines <- merge(tourism_ts, stringency_philippines, by = "date", all.x = TRUE)

# Then merge with Switzerland data
merged_data <- merge(merged_data_philippines, stringency_switzerland, by = "date", all.x = TRUE, suffixes = c("_PH", "_SW"))

# Replace NA values in avg_stringency_index with 0 if necessary
merged_data$avg_stringency_index_PH[is.na(merged_data$avg_stringency_index_PH)] <- 0
merged_data$avg_stringency_index_SW[is.na(merged_data$avg_stringency_index_SW)] <- 0


# Create a ggplot of the stringency index
ggplot(merged_data, aes(x = date, y = avg_stringency_index_PH, color = "Philippines")) +
  geom_line() +
  geom_line(aes(y = avg_stringency_index_SW, color = "Switzerland")) +
  labs(title = "Stringency Index in the Philippines and Switzerland",
       x = "Date",
       y = "Stringency Index") +
  scale_color_manual(values = c("#3C5B6F", "darkred"),
                     labels = c("Philippines", "Switzerland"))

#show merge data using reactable
reactable::reactable(merged_data)

5.2.2.1.2.2 Model Selection

Choose a forecasting model that can incorporate exogenous variables (covariates). Models such as:

  • Multiple Regression Analysis: Simple yet effective, if the relationships between the covariates and the dependent variable (tourist numbers) are linear.
Click to show code
# Fit a multiple regression model
model <- lm(value ~ avg_stringency_index_PH + avg_stringency_index_SW, data = merged_data)

# Summary of the model
summary(model)
#> 
#> Call:
#> lm(formula = value ~ avg_stringency_index_PH + avg_stringency_index_SW, 
#>     data = merged_data)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -318.9 -157.3  -69.3   93.7  873.7 
#> 
#> Coefficients:
#>                         Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)               246.32      15.85   15.54   <2e-16 ***
#> avg_stringency_index_PH     5.87       2.55    2.30   0.0221 *  
#> avg_stringency_index_SW   -12.19       3.73   -3.26   0.0013 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 220 on 222 degrees of freedom
#> Multiple R-squared:  0.0931, Adjusted R-squared:  0.0849 
#> F-statistic: 11.4 on 2 and 222 DF,  p-value: 1.95e-05

# Forecast the next 24 months
forecast_values <- predict(model, newdata = merged_data)
  • ARIMAX (Autoregressive Integrated Moving Average with Exogenous Variables): Suitable for handling time series data with external influences.
Click to show code
#transform to a time series object frequency 12
tourism_ts <- ts(merged_data$value)  # Monthly data has frequency 12

# Prepare the exogenous regressors
exog_data <- cbind(merged_data$avg_stringency_index_PH, merged_data$avg_stringency_index_SW)

# Fit an ARIMAX model
model <- auto.arima(tourism_ts, xreg = exog_data, seasonal = TRUE)

# Summary of the model
summary(model)
#> Series: tourism_ts 
#> Regression with ARIMA(1,1,1) errors 
#> 
#> Coefficients:
#>         ar1     ma1  drift  xreg1  xreg2
#>       0.350  -0.930  3.295  -4.66  -2.83
#> s.e.  0.074   0.033  0.925   1.86   2.46
#> 
#> sigma^2 = 14509:  log likelihood = -1389
#> AIC=2790   AICc=2791   BIC=2811
#> 
#> Training set error measures:
#>                  ME RMSE  MAE  MPE MAPE  MASE    ACF1
#> Training set -0.808  119 73.9 1.57 81.6 0.934 0.00814


#forecast 24 months
forecast_values <- model %>%
  forecast(xreg = exog_data, h = 24)


# Plot the forecast along with the actual data
plot_arimax <- autoplot(forecast_values, series = "Forecast", alpha = 0.5) +
  autolayer(tourism_ts, series = "Actual Data", alpha = 0.8) +  # Ensure tourism_ts is correctly formatted as ts
  labs(title = "ARIMAX Forecast of Tourists from the Philippines to Zurich",
       x = "Date",
       y = "Number of Tourists") +
  guides(colour = guide_legend(title = "Data Type"))

print(tourism_ts)
#> Time Series:
#> Start = 1 
#> End = 225 
#> Frequency = 1 
#>   [1]   57   30   46   73   74   73   57   69   70  103   46   47
#>  [13]   49   29   55   81   82  113  115   87  100  134   90   52
#>  [25]   77   35   50   83  109   94   42   52  112   66   45   48
#>  [37]  122   55   58   82   76   58   46   53   58   70   35   45
#>  [49]   76   35   95   88   89   90  100   96  111   81   75   46
#>  [61]   52   66   83  127  164  125  106  105  139  219  110   70
#>  [73]  103   92   86  177  161  189  126  175  169  201  195   72
#>  [85]  120  131  138  208  286  191  110  123  227  247  145  110
#>  [97]  181   66  159  233  239  182  136  128  156  205  174  104
#> [109]   96   89  138  287  316  223  225  194  326  340  158  230
#> [121]  109  139  239  290  478  295  326  273  320  388  221  269
#> [133]  201  202  297  556  442  517  343  278  387  495  268  383
#> [145]  202  344  282  810  746  437  560  377  459  519  280  513
#> [157]  178  358  368  563  657  536  461  376  446  498  370  518
#> [169]  201  161  209  706  661  731  576  401  442  562  334  749
#> [181]  202  133   76    3    3    9   35   24   18   14   13   22
#> [193]   14    9   10   13   18   18   49   56   78   68   74  132
#> [205]   78   51   85  142  282  516  473  452  578  850  655 1159
#> [217]  466  344  537 1096 1071 1120 1036  586  827
print(forecast_values)
#>     Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
#> 226            789   634   943   553  1025
#> 227            778   610   945   521  1034
#> 228            776   605   947   515  1037
#> 229            777   605   950   514  1041
#> 230            780   607   953   515  1045
#> 231            783   609   957   517  1049
#> 232            786   611   961   519  1054
#> 233            790   614   965   521  1058
#> 234            793   616   969   523  1063
#> 235            796   619   974   525  1067
#> 236            799   621   978   527  1072
#> 237            803   624   982   529  1076
#> 238            806   626   986   531  1081
#> 239            809   629   990   533  1085
#> 240            813   631   994   535  1090
#> 241            816   634   998   537  1094
#> 242            819   636  1002   540  1099
#> 243            822   639  1006   542  1103
#> 244            826   641  1010   544  1108
#> 245            829   644  1014   546  1112
#> 246            832   646  1018   548  1117
#> 247            836   649  1022   550  1121
#> 248            839   652  1026   552  1125
#> 249            842   654  1030   555  1130
#> 250            846   657  1034   557  1134
#> 251            849   659  1038   559  1139
#> 252            852   662  1042   561  1143
#> 253            855   664  1046   563  1148
#> 254            859   667  1050   565  1152
#> 255            862   670  1055   568  1156
#> 256            865   672  1059   570  1161
#> 257            869   675  1063   572  1165
#> 258            872   677  1067   574  1170
#> 259            875   680  1071   576  1174
#> 260            878   682  1075   579  1178
#> 261            882   685  1079   581  1183
#> 262            885   688  1083   583  1187
#> 263            888   690  1087   585  1192
#> 264            892   693  1091   587  1196
#> 265            895   695  1095   590  1200
#> 266            898   698  1099   592  1205
#> 267            902   701  1103   594  1209
#> 268            905   703  1107   596  1213
#> 269            908   706  1111   599  1218
#> 270            911   708  1115   601  1222
#> 271            915   711  1119   603  1226
#> 272            918   714  1122   605  1231
#> 273            921   716  1126   608  1235
#> 274            925   719  1130   610  1239
#> 275            928   721  1134   612  1244
#> 276            931   724  1138   614  1248
#> 277            935   727  1142   617  1252
#> 278            938   729  1146   619  1257
#> 279            941   732  1150   621  1261
#> 280            944   735  1154   623  1265
#> 281            948   737  1158   626  1270
#> 282            951   740  1162   628  1274
#> 283            954   742  1166   630  1278
#> 284            958   745  1170   633  1283
#> 285            961   748  1174   635  1287
#> 286            964   750  1178   637  1291
#> 287            967   753  1182   639  1295
#> 288            971   756  1186   642  1300
#> 289            974   758  1190   644  1304
#> 290            977   761  1194   646  1308
#> 291            981   764  1198   649  1313
#> 292            984   766  1202   651  1317
#> 293            987   769  1206   653  1321
#> 294            991   772  1209   656  1325
#> 295            994   774  1213   658  1330
#> 296            997   777  1217   660  1334
#> 297           1000   780  1221   663  1338
#> 298           1004   782  1225   665  1342
#> 299           1007   785  1229   667  1347
#> 300           1010   788  1233   670  1351
#> 301           1014   790  1237   672  1355
#> 302           1017   793  1241   674  1359
#> 303           1020   796  1245   677  1364
#> 304           1023   798  1249   679  1368
#> 305           1027   801  1253   681  1372
#> 306           1030   804  1257   684  1376
#> 307           1033   806  1260   686  1381
#> 308           1037   809  1264   688  1385
#> 309           1040   812  1268   691  1389
#> 310           1043   814  1272   693  1393
#> 311           1047   817  1276   695  1398
#> 312           1050   820  1280   698  1402
#> 313           1053   822  1284   700  1406
#> 314           1056   825  1288   703  1410
#> 315           1060   828  1292   705  1415
#> 316           1063   830  1296   707  1419
#> 317           1066   833  1299   710  1423
#> 318           1070   836  1303   712  1427
#> 319           1073   839  1307   714  1431
#> 320           1076   841  1311   717  1436
#> 321           1079   844  1315   719  1440
#> 322           1083   847  1319   722  1444
#> 323           1086   849  1323   724  1448
#> 324           1089   852  1327   726  1452
#> 325           1093   855  1331   729  1457
#> 326           1096   857  1334   731  1461
#> 327           1099   860  1338   734  1465
#> 328           1103   863  1342   736  1469
#> 329           1106   866  1346   738  1473
#> 330           1109   868  1350   741  1477
#> 331           1112   871  1354   743  1482
#> 332           1116   874  1358   746  1486
#> 333           1119   876  1362   748  1490
#> 334           1122   879  1365   750  1494
#> 335           1126   882  1369   753  1498
#> 336           1129   885  1373   755  1503
#> 337           1132   887  1377   758  1507
#> 338           1136   890  1381   760  1511
#> 339           1139   893  1385   763  1515
#> 340           1142   896  1389   765  1519
#> 341           1145   898  1393   767  1523
#> 342           1149   901  1396   770  1528
#> 343           1152   904  1400   772  1532
#> 344           1155   906  1404   775  1536
#> 345           1159   909  1408   777  1540
#> 346           1162   912  1412   780  1544
#> 347           1165   915  1416   782  1548
#> 348           1168   917  1420   784  1552
#> 349           1172   920  1423   787  1557
#> 350           1175   923  1427   789  1561
#> 351           1178   926  1431   792  1565
#> 352           1182   928  1435   794  1569
#> 353           1185   931  1439   797  1573
#> 354           1188   934  1443   799  1577
#> 355           1192   937  1446   802  1581
#> 356           1195   939  1450   804  1586
#> 357           1198   942  1454   807  1590
#> 358           1201   945  1458   809  1594
#> 359           1205   948  1462   811  1598
#> 360           1208   950  1466   814  1602
#> 361           1211   953  1469   816  1606
#> 362           1215   956  1473   819  1610
#> 363           1218   959  1477   821  1614
#> 364           1221   961  1481   824  1619
#> 365           1224   964  1485   826  1623
#> 366           1228   967  1489   829  1627
#> 367           1231   970  1492   831  1631
#> 368           1234   972  1496   834  1635
#> 369           1238   975  1500   836  1639
#> 370           1241   978  1504   839  1643
#> 371           1244   981  1508   841  1647
#> 372           1248   983  1512   844  1651
#> 373           1251   986  1515   846  1656
#> 374           1254   989  1519   849  1660
#> 375           1257   992  1523   851  1664
#> 376           1261   995  1527   854  1668
#> 377           1264   997  1531   856  1672
#> 378           1267  1000  1535   859  1676
#> 379           1271  1003  1538   861  1680
#> 380           1274  1006  1542   864  1684
#> 381           1277  1008  1546   866  1688
#> 382           1280  1011  1550   869  1692
#> 383           1284  1014  1554   871  1696
#> 384           1287  1017  1557   874  1701
#> 385           1290  1020  1561   876  1705
#> 386           1294  1022  1565   879  1709
#> 387           1297  1025  1569   881  1713
#> 388           1300  1028  1573   884  1717
#> 389           1304  1031  1576   886  1721
#> 390           1307  1033  1580   889  1725
#> 391           1310  1036  1584   891  1729
#> 392           1313  1039  1588   894  1733
#> 393           1317  1042  1592   896  1737
#> 394           1320  1045  1596   899  1741
#> 395           1323  1047  1599   901  1745
#> 396           1327  1050  1603   904  1749
#> 397           1330  1053  1607   906  1754
#> 398           1333  1056  1611   909  1758
#> 399           1337  1059  1615   911  1762
#> 400           1340  1061  1618   914  1766
#> 401           1343  1064  1622   916  1770
#> 402           1346  1067  1626   919  1774
#> 403           1350  1070  1630   921  1778
#> 404           1353  1072  1633   924  1782
#> 405           1356  1075  1637   927  1786
#> 406           1343  1061  1624   912  1773
#> 407           1245   963  1527   814  1676
#> 408            924   642  1207   492  1356
#> 409            698   415   981   265  1130
#> 410            757   474  1041   324  1191
#> 411            883   599  1167   449  1317
#> 412            908   623  1192   473  1343
#> 413            910   625  1195   474  1346
#> 414            951   666  1237   515  1388
#> 415            977   691  1263   540  1414
#> 416            907   620  1193   469  1345
#> 417            917   630  1204   478  1356
#> 418            895   608  1182   456  1335
#> 419            895   608  1183   455  1336
#> 420            888   600  1177   447  1329
#> 421            886   597  1175   444  1327
#> 422            943   653  1232   500  1385
#> 423            937   648  1227   494  1381
#> 424            960   669  1250   516  1404
#> 425            936   645  1226   491  1380
#> 426            942   650  1233   496  1387
#> 427            930   639  1222   484  1376
#> 428            975   683  1268   529  1422
#> 429            983   690  1276   535  1430
#> 430            985   692  1278   537  1434
#> 431           1071   778  1365   622  1521
#> 432           1135   841  1429   685  1585
#> 433           1283   988  1577   832  1733
#> 434           1277   982  1572   825  1728
#> 435           1281   985  1576   829  1733
#> 436           1285   989  1581   832  1737
#> 437           1296  1000  1593   843  1750
#> 438           1308  1011  1605   854  1762
#> 439           1311  1014  1609   856  1766
#> 440           1337  1040  1635   882  1793
#> 441           1341  1042  1639   884  1797
#> 442           1478  1179  1777  1021  1935
#> 443           1481  1182  1781  1024  1939
#> 444           1485  1185  1785  1026  1943
#> 445           1488  1188  1788  1029  1947
#> 446           1491  1191  1792  1032  1951
#> 447           1495  1194  1796  1034  1955
#> 448           1498  1196  1800  1037  1959
#> 449           1501  1199  1803  1039  1963
#> 450           1505  1202  1807  1042  1967
print(exog_data)
#>         [,1]  [,2]
#>   [1,]   0.0  0.00
#>   [2,]   0.0  0.00
#>   [3,]   0.0  0.00
#>   [4,]   0.0  0.00
#>   [5,]   0.0  0.00
#>   [6,]   0.0  0.00
#>   [7,]   0.0  0.00
#>   [8,]   0.0  0.00
#>   [9,]   0.0  0.00
#>  [10,]   0.0  0.00
#>  [11,]   0.0  0.00
#>  [12,]   0.0  0.00
#>  [13,]   0.0  0.00
#>  [14,]   0.0  0.00
#>  [15,]   0.0  0.00
#>  [16,]   0.0  0.00
#>  [17,]   0.0  0.00
#>  [18,]   0.0  0.00
#>  [19,]   0.0  0.00
#>  [20,]   0.0  0.00
#>  [21,]   0.0  0.00
#>  [22,]   0.0  0.00
#>  [23,]   0.0  0.00
#>  [24,]   0.0  0.00
#>  [25,]   0.0  0.00
#>  [26,]   0.0  0.00
#>  [27,]   0.0  0.00
#>  [28,]   0.0  0.00
#>  [29,]   0.0  0.00
#>  [30,]   0.0  0.00
#>  [31,]   0.0  0.00
#>  [32,]   0.0  0.00
#>  [33,]   0.0  0.00
#>  [34,]   0.0  0.00
#>  [35,]   0.0  0.00
#>  [36,]   0.0  0.00
#>  [37,]   0.0  0.00
#>  [38,]   0.0  0.00
#>  [39,]   0.0  0.00
#>  [40,]   0.0  0.00
#>  [41,]   0.0  0.00
#>  [42,]   0.0  0.00
#>  [43,]   0.0  0.00
#>  [44,]   0.0  0.00
#>  [45,]   0.0  0.00
#>  [46,]   0.0  0.00
#>  [47,]   0.0  0.00
#>  [48,]   0.0  0.00
#>  [49,]   0.0  0.00
#>  [50,]   0.0  0.00
#>  [51,]   0.0  0.00
#>  [52,]   0.0  0.00
#>  [53,]   0.0  0.00
#>  [54,]   0.0  0.00
#>  [55,]   0.0  0.00
#>  [56,]   0.0  0.00
#>  [57,]   0.0  0.00
#>  [58,]   0.0  0.00
#>  [59,]   0.0  0.00
#>  [60,]   0.0  0.00
#>  [61,]   0.0  0.00
#>  [62,]   0.0  0.00
#>  [63,]   0.0  0.00
#>  [64,]   0.0  0.00
#>  [65,]   0.0  0.00
#>  [66,]   0.0  0.00
#>  [67,]   0.0  0.00
#>  [68,]   0.0  0.00
#>  [69,]   0.0  0.00
#>  [70,]   0.0  0.00
#>  [71,]   0.0  0.00
#>  [72,]   0.0  0.00
#>  [73,]   0.0  0.00
#>  [74,]   0.0  0.00
#>  [75,]   0.0  0.00
#>  [76,]   0.0  0.00
#>  [77,]   0.0  0.00
#>  [78,]   0.0  0.00
#>  [79,]   0.0  0.00
#>  [80,]   0.0  0.00
#>  [81,]   0.0  0.00
#>  [82,]   0.0  0.00
#>  [83,]   0.0  0.00
#>  [84,]   0.0  0.00
#>  [85,]   0.0  0.00
#>  [86,]   0.0  0.00
#>  [87,]   0.0  0.00
#>  [88,]   0.0  0.00
#>  [89,]   0.0  0.00
#>  [90,]   0.0  0.00
#>  [91,]   0.0  0.00
#>  [92,]   0.0  0.00
#>  [93,]   0.0  0.00
#>  [94,]   0.0  0.00
#>  [95,]   0.0  0.00
#>  [96,]   0.0  0.00
#>  [97,]   0.0  0.00
#>  [98,]   0.0  0.00
#>  [99,]   0.0  0.00
#> [100,]   0.0  0.00
#> [101,]   0.0  0.00
#> [102,]   0.0  0.00
#> [103,]   0.0  0.00
#> [104,]   0.0  0.00
#> [105,]   0.0  0.00
#> [106,]   0.0  0.00
#> [107,]   0.0  0.00
#> [108,]   0.0  0.00
#> [109,]   0.0  0.00
#> [110,]   0.0  0.00
#> [111,]   0.0  0.00
#> [112,]   0.0  0.00
#> [113,]   0.0  0.00
#> [114,]   0.0  0.00
#> [115,]   0.0  0.00
#> [116,]   0.0  0.00
#> [117,]   0.0  0.00
#> [118,]   0.0  0.00
#> [119,]   0.0  0.00
#> [120,]   0.0  0.00
#> [121,]   0.0  0.00
#> [122,]   0.0  0.00
#> [123,]   0.0  0.00
#> [124,]   0.0  0.00
#> [125,]   0.0  0.00
#> [126,]   0.0  0.00
#> [127,]   0.0  0.00
#> [128,]   0.0  0.00
#> [129,]   0.0  0.00
#> [130,]   0.0  0.00
#> [131,]   0.0  0.00
#> [132,]   0.0  0.00
#> [133,]   0.0  0.00
#> [134,]   0.0  0.00
#> [135,]   0.0  0.00
#> [136,]   0.0  0.00
#> [137,]   0.0  0.00
#> [138,]   0.0  0.00
#> [139,]   0.0  0.00
#> [140,]   0.0  0.00
#> [141,]   0.0  0.00
#> [142,]   0.0  0.00
#> [143,]   0.0  0.00
#> [144,]   0.0  0.00
#> [145,]   0.0  0.00
#> [146,]   0.0  0.00
#> [147,]   0.0  0.00
#> [148,]   0.0  0.00
#> [149,]   0.0  0.00
#> [150,]   0.0  0.00
#> [151,]   0.0  0.00
#> [152,]   0.0  0.00
#> [153,]   0.0  0.00
#> [154,]   0.0  0.00
#> [155,]   0.0  0.00
#> [156,]   0.0  0.00
#> [157,]   0.0  0.00
#> [158,]   0.0  0.00
#> [159,]   0.0  0.00
#> [160,]   0.0  0.00
#> [161,]   0.0  0.00
#> [162,]   0.0  0.00
#> [163,]   0.0  0.00
#> [164,]   0.0  0.00
#> [165,]   0.0  0.00
#> [166,]   0.0  0.00
#> [167,]   0.0  0.00
#> [168,]   0.0  0.00
#> [169,]   0.0  0.00
#> [170,]   0.0  0.00
#> [171,]   0.0  0.00
#> [172,]   0.0  0.00
#> [173,]   0.0  0.00
#> [174,]   0.0  0.00
#> [175,]   0.0  0.00
#> [176,]   0.0  0.00
#> [177,]   0.0  0.00
#> [178,]   0.0  0.00
#> [179,]   0.0  0.00
#> [180,]   0.0  0.00
#> [181,]   3.6  0.00
#> [182,]  23.8  2.39
#> [183,]  64.8 49.37
#> [184,] 100.0 72.66
#> [185,]  94.5 61.74
#> [186,]  78.8 44.51
#> [187,]  77.4 39.20
#> [188,]  75.2 43.06
#> [189,]  67.1 43.06
#> [190,]  68.3 33.18
#> [191,]  69.0 58.06
#> [192,]  66.8 59.20
#> [193,]  71.6 60.19
#> [194,]  72.2 60.19
#> [195,]  74.5 60.19
#> [196,]  78.0 56.49
#> [197,]  69.9 50.84
#> [198,]  73.7 47.53
#> [199,]  71.5 44.44
#> [200,]  77.4 44.44
#> [201,]  74.8 47.78
#> [202,]  76.6 50.00
#> [203,]  67.6 50.00
#> [204,]  66.1 51.08
#> [205,]  64.1 54.69
#> [206,]  58.0 35.45
#> [207,]  57.6 14.81
#> [208,]  28.8 11.11
#> [209,]  30.8 11.11
#> [210,]  30.6 11.11
#> [211,]  30.5 11.11
#> [212,]  30.5  8.25
#> [213,]  30.4  5.56
#> [214,]  30.3  5.56
#> [215,]  25.4  5.56
#> [216,]  25.4  5.56
#> [217,]   0.0  0.00
#> [218,]   0.0  0.00
#> [219,]   0.0  0.00
#> [220,]   0.0  0.00
#> [221,]   0.0  0.00
#> [222,]   0.0  0.00
#> [223,]   0.0  0.00
#> [224,]   0.0  0.00
#> [225,]   0.0  0.00
plot_arimax

  • State Space Models/Structural Time Series Model: These models are flexible in handling multiple covariates with varying impacts over time.
5.2.2.1.3 Incorporating Dummy Variables

Introduce dummy variables into your forecasting models to account for the impact of COVID-19. These variables can be set to 1 for the periods affected by the pandemic and 0 otherwise. This approach allows the model to differentiate the impact of COVID-19 from normal variations in the data.

5.2.2.2 Other ideas

  • Replacing covid values with the ARIMA, If the missing values are random or if excluding them would result in a loss of valuable information, we might consider imputing them. One common approach is to use statistical models like ARIMA to interpolate missing values based on the patterns observed in the available data.
  • Delete the timestamp of the covid period and use fill_gaps() to fill the missing values and then use the a model to predict the missing values.
  • If we use the ARIMA model, we can use covariates to account for the impact of COVID-19. By including a dummy variable that captures the effect of the pandemic, we can adjust the forecasts to reflect the changes in the data due to COVID-19.

6 Model Selection

Use Information Criteria for Model Comparison: Evaluate models based on criteria such as AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion), which help in selecting a model that balances goodness of fit with complexity.